38 research outputs found
Compulsory Flow Q-Learning: an RL algorithm for robot navigation based on partial-policy and macro-states
Reinforcement Learning is carried out on-line, through trial-and-error interactions of the agent with the environment, which can be very time consuming when considering robots. In this paper we contribute a new learning algorithm, CFQ-Learning, which uses macro-states, a low-resolution discretisation of the state space, and a partial-policy to get around obstacles, both of them based on the complexity of the environment structure. The use of macro-states can prevent algorithms from converging, but it can accelerate the learning process. On the other hand, partial-policies can guarantee that an agent fulfils its task, even when macro-states are used. Experiments show that CFQ-Learning achieves a good balance between policy quality and learning rate. Funding: Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior (CAPES), GRICES, FAPESP, CNP
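As a rough illustration of the macro-state idea only (this is not CFQ-Learning and omits the partial-policy), the following sketch runs ordinary tabular Q-learning over a coarse discretisation of a small grid world. The grid size, cell size, reward scheme, and all function names are assumptions made for this example.

```python
import random

# Four grid moves: right, left, up, down (hypothetical action set).
ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def macro_state(x, y, cell=2):
    """Map a fine grid position to a coarse macro-state (assumed cell size)."""
    return (x // cell, y // cell)


def step(pos, action, size=6, goal=(5, 5)):
    """Move inside a size x size grid; reward 1.0 on reaching the goal."""
    x = min(max(pos[0] + action[0], 0), size - 1)
    y = min(max(pos[1] + action[1], 0), size - 1)
    done = (x, y) == goal
    return (x, y), (1.0 if done else 0.0), done


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Plain Q-learning whose table is indexed by macro-states, not positions."""
    rng = random.Random(seed)
    q = {}  # (macro_state, action_index) -> estimated value
    n = len(ACTIONS)
    for _ in range(episodes):
        pos = (0, 0)
        for _ in range(200):  # cap on episode length
            s = macro_state(*pos)
            if rng.random() < epsilon:
                a = rng.randrange(n)  # explore
            else:
                a = max(range(n), key=lambda i: q.get((s, i), 0.0))  # exploit
            nxt, reward, done = step(pos, ACTIONS[a])
            s2 = macro_state(*nxt)
            best_next = 0.0 if done else max(q.get((s2, i), 0.0) for i in range(n))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (reward + gamma * best_next - old)
            pos = nxt
            if done:
                break
    return q
```

Because several positions share one table entry, the table stays small (at most 9 macro-states here instead of 36 positions), which is the source of the speed-up, while the aliasing between positions is why convergence is no longer guaranteed.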
General detection model in cooperative multirobot localization
The cooperative multirobot localization problem consists in localizing each robot in a group within the same environment, when robots share information in order to improve localization accuracy. It can be achieved when a robot detects and identifies another one, and measures their relative distance. At this moment, both robots can use detection information to update their own pose beliefs. However, other useful information besides a single detection between a pair of robots can be used to update the robots' pose beliefs, such as: propagation of a single detection to non-participant robots, absence of detections, and detections involving more than a pair of robots. A general detection model is proposed in order to aggregate all detection information, addressing the problem of updating pose beliefs in all the situations described. Experimental results in a simulated environment with groups of robots show that the proposed model improves localization accuracy when compared to conventional single-detection multirobot localization. Funding: FAPESP, CNP
Agents Teaching Agents: A Survey on Inter-agent Transfer Learning
Autonomous Agents and Multi-Agent Systems published a piece about Inter-agent Transfer Learning in January 2020. Funding: Office of the VP for Researc
Markov decision processes for ad network optimization
In this paper we examine a central problem in a particular advertising
scheme: we are concerned with matching marketing campaigns that produce
advertisements (“ads”), to impressions, where “impression” is a general term
for any space in the internet that can display an ad. In this paper we propose a
new take on the problem by resorting to planning techniques based on Markov
Decision Processes, and by resorting to plan generation techniques that have
been developed in the AI literature. We present a detailed formulation of the
Markov Decision Process approach and results of simulated experiments.
Anna Helena Reali Costa and Fábio Gagliardi Cozman are partially supported by CNPq. Flávio Sales Truzzi is supported by CAPES. The work reported here has received substantial support through FAPESP grant 2008/03995-5 and FAPESP grant 2011/19280-
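To make the planning idea concrete, here is a minimal value-iteration sketch on a toy Markov Decision Process. It is not the formulation from the paper: the states (remaining impressions to allocate), the two campaigns, and their click probabilities and payoffs are all invented for illustration.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Solve a small MDP.

    transitions[state][action] is a list of (probability, next_state, reward).
    States with an empty action dict are treated as terminal.
    """
    values = {s: 0.0 for s in transitions}

    def action_value(outcomes):
        return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(action_value(o) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    policy = {
        s: max(actions, key=lambda a: action_value(actions[a]))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical toy problem: each impression is given to campaign_a
# (pays 1.0 with probability 0.5) or campaign_b (always pays 0.6).
toy = {
    "left_2": {
        "campaign_a": [(0.5, "left_1", 1.0), (0.5, "left_1", 0.0)],
        "campaign_b": [(1.0, "left_1", 0.6)],
    },
    "left_1": {
        "campaign_a": [(0.5, "left_0", 1.0), (0.5, "left_0", 0.0)],
        "campaign_b": [(1.0, "left_0", 0.6)],
    },
    "left_0": {},
}

values, policy = value_iteration(toy)
```

In this toy instance campaign_b has the higher expected payoff per impression (0.6 versus 0.5), so the computed policy assigns every impression to it.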
Reinforcement Learning Applied to Trading Systems: A Survey
Financial domain tasks, such as trading in market exchanges, are challenging
and have long attracted researchers. The recent achievements and the consequent
notoriety of Reinforcement Learning (RL) have also increased its adoption in
trading tasks. RL uses a framework with well-established formal concepts, which
raises its attractiveness in learning profitable trading strategies. However,
RL use without due attention in the financial area can prevent new researchers
from following standards or lead them to neglect relevant conceptual guidelines. In
this work, we embrace the seminal RL technical fundamentals, concepts, and
recommendations to perform a unified, theoretically-grounded examination and
comparison of previous research that could serve as a structuring guide for the
field of study. A selection of twenty-nine articles was reviewed under our
classification that considers RL's most common formulations and design patterns
from a large volume of available studies. This classification allowed for
precise inspection of the most relevant aspects regarding data input,
preprocessing, state and action composition, adopted RL techniques, evaluation
setups, and overall results. Our analysis approach organized around fundamental
RL concepts allowed for a clear identification of current system design best
practices, gaps that require further investigation, and promising research
opportunities. Finally, this review attempts to promote the development of this
field of study by facilitating researchers' commitment to standards adherence
and helping them to avoid straying away from the RL constructs' firm ground.
Comment: 38 page
Virtual Reality: Stereoscopy in Education (Realidade Virtual: Estereoscopia na Educação)
Virtual reality (VR) in education is a topic strongly present in research institutions in many countries. This article discusses the application of VR techniques, including the use of computer graphics and the production of three-dimensional videos with equipment that is specialized but low-cost for educational institutions. Stereoscopy plays a key role in the visualization of these applications. The project uses a 3D lens, a consumer camera, low-cost projectors, polarized light filters, and passive 3D glasses. The goal of producing the 3D video was to evaluate everything from the processes involved in script writing, recording, and exhibition to the costs required for an educational institution to adopt virtual reality resources to improve learning.
Speeding-up reinforcement learning through abstraction and transfer learning
We are interested in the following general question: is it possible to
abstract knowledge that is generated while learning the solution of a
problem, so that this abstraction can accelerate the learning process?
Moreover, is it possible to transfer and reuse the acquired abstract
knowledge to accelerate the learning process for future similar tasks? We
propose a framework for conducting simultaneously two levels of
reinforcement learning, where an abstract policy is learned while learning
a concrete policy for the problem, such that both policies are refined
through exploration and interaction of the agent with the environment. We
explore abstraction both to accelerate the learning process for an optimal
concrete policy for the current problem, and to allow the application of
the generated abstract policy in learning solutions for new problems. We
report experiments in a robot navigation environment that show our
framework to be effective in speeding up policy construction for practical
problems and in generating abstractions that can be used to accelerate
learning in new similar problems.
This research was partially supported by FAPESP (2011/19280-8, 2012/02190-9, 2012/19627-0) and CNPq (311058/2011-6, 305395/2010-6
From Random to Informed Data Selection: A Diversity-Based Approach to Optimize Human Annotation and Few-Shot Learning
A major challenge in Natural Language Processing is obtaining annotated data
for supervised learning. An option is the use of crowdsourcing platforms for
data annotation. However, crowdsourcing introduces issues related to the
annotator's experience, consistency, and biases. An alternative is to use
zero-shot methods, which in turn have limitations compared to their few-shot or
fully supervised counterparts. Recent advancements driven by large language
models show potential, but struggle to adapt to specialized domains with
severely limited data. The most common approaches therefore involve humans
randomly annotating a set of datapoints to build initial datasets. But
randomly sampling data to be annotated is often inefficient as it ignores the
characteristics of the data and the specific needs of the model. The situation
worsens when working with imbalanced datasets, as random sampling tends to
heavily bias towards the majority classes, leading to excessive annotated data.
To address these issues, this paper contributes an automatic and informed data
selection architecture to build a small dataset for few-shot learning. Our
proposal minimizes the quantity and maximizes diversity of data selected for
human annotation, while improving model performance.
Comment: Accepted at PROPOR 2024 - The 16th International Conference on
Computational Processing of Portugues
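One simple way to prefer diverse datapoints over random ones is greedy farthest-point selection over embedding vectors. The sketch below is a generic illustration of that idea, not the architecture proposed in the paper; the function name, the Euclidean distance, and the toy 2-D "embeddings" are assumptions.

```python
import math


def farthest_point_selection(points, k):
    """Greedily pick k indices so that each new pick is as far as possible
    from everything already selected (starting from index 0)."""
    if not points or k <= 0:
        return []
    selected = [0]
    # Distance from every point to its nearest selected point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(k, len(points)):
        idx = max(range(len(points)), key=lambda i: nearest[i])
        selected.append(idx)
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return selected


# Toy 2-D vectors forming two tight pairs and one isolated point.
embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
chosen = farthest_point_selection(embeddings, 3)
```

With these toy vectors the selection takes one item from each cluster rather than two near-duplicates, which is the behaviour that random sampling cannot guarantee on imbalanced data.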
DEBACER: a method for slicing moderated debates
Subjects change frequently in moderated debates with several participants, such as in parliamentary sessions, electoral debates, and trials. Partitioning a debate into blocks with the same subject is essential for understanding. Often a moderator is responsible for defining when a new block begins so that the task of automatically partitioning a moderated debate can focus solely on the moderator's behavior. In this paper, we (i) propose a new algorithm, DEBACER, which partitions moderated debates; (ii) carry out a comparative study between conventional and BERTimbau pipelines; and (iii) validate DEBACER applying it to the minutes of the Assembly of the Republic of Portugal. Our results show the effectiveness of DEBACER.